Speaker diarization from speech transcripts
Authors
Abstract
The aim of this study is to investigate the use of the linguistic information present in the audio signal to structure broadcast news data, and in particular to associate speaker identities with audio segments. While speaker recognition has been an active area of research for many years, addressing the problem of identifying speakers in huge audio corpora is relatively recent and has been mainly concerned with speaker tracking. The speech transcriptions contain a wealth of linguistic information that is useful for speaker diarization. Patterns which can be used to identify the current, previous or next speaker have been developed based on the analysis of 150 hours of manually transcribed broadcast news data. Each pattern is associated with one or more rules. After validation on the training transcripts, these patterns and rules were tested on an independent data set containing transcripts of 10 hours of broadcasts.
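A minimal sketch of how such patterns and rules might be applied to transcript segments is given below. The abstract does not list the actual patterns, so the three regular expressions here ("I'm NAME", "thank you, NAME", "NAME reports"), the `RULES` table, and the `label_segments` function are all hypothetical illustrations, not the study's method.

```python
import re
from typing import List, Optional

# Hypothetical name matcher: two or more capitalized words, e.g. "Anna Smith".
NAME = r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Each illustrative pattern is tied to a rule saying which segment
# (previous, current or next) the matched name should be attached to.
# These patterns are assumptions; the study's real patterns are not shown.
RULES = [
    (re.compile(r"\b(?:I am|I'm|this is) " + NAME), "current"),
    (re.compile(r"\b[Tt]hank you,? " + NAME), "previous"),
    (re.compile(NAME + r" reports\b"), "next"),
]

OFFSET = {"previous": -1, "current": 0, "next": 1}


def label_segments(segments: List[str]) -> List[Optional[str]]:
    """Attach a speaker name to each segment where some rule fires.

    Segments that no rule points at stay None. The first name assigned
    to a segment is kept; later matches do not overwrite it.
    """
    labels: List[Optional[str]] = [None] * len(segments)
    for i, text in enumerate(segments):
        for pattern, target in RULES:
            for match in pattern.finditer(text):
                j = i + OFFSET[target]
                if 0 <= j < len(segments) and labels[j] is None:
                    labels[j] = match.group("name")
    return labels
```

For example, given the three segments "Good evening, I'm Anna Smith. Our correspondent John Doe reports.", "The storm hit the coast early today." and "Thank you, John Doe. In other news, markets fell.", this sketch returns `["Anna Smith", "John Doe", None]`: the first segment names its own speaker and announces the next one, while the last segment confirms the previous speaker but says nothing about itself.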
Similar resources
Speaker Diarization - “Who Spoke When”
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. I...
Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription
In this paper we study a close incorporation of speaker diarization with speaker adaptive speech recognition in our broadcast transcription system. We provide our motivation for utilization of speech transcripts in the diarization process and analyze the effect it yields in terms of diarization performance or computational cost. Further, speaker adaptation performed according to various scenari...
Towards Using STT for Broadcast News Speaker Diarization
The aim of this study is to investigate the use of the linguistic information present in the audio signal to structure broadcast news data, and in particular to associate speaker identities with audio segments. While speaker recognition has been an active area of research for many years, addressing the problem of identifying speakers in huge audio corpora is relatively recent and has been mainl...
Phoneme background model for information bottleneck based speaker diarization
Acoustic variability of speakers arises due to differences in their vocal tract characteristics. These individual speaker characteristics are reflected in a speech signal when speakers pronounce a given phoneme. The current work hypothesizes that clusters within a phoneme spoken by multiple speakers roughly correspond to different speakers. Based on this hypothesis, a Gaussian mixture model (GM...
The Approach of Speaker Diarization by Gaussian Mixture Model (GMM)
Speaker identification is an important activity in the process of speaker diarization. Each speaker is modeled with a Gaussian mixture model (GMM) for speaker identification. A large GMM, called a Universal Background Model (UBM), is adapted into each speaker model for this purpose. This paper focuses on speech clustering for speaker diarization. The speaker d...